Add Z Image LoRA fine tuning support#1127

Closed
ParamThakkar123 wants to merge 49 commits into main from add/z-image-ft

Conversation

@ParamThakkar123
Contributor

@ParamThakkar123 commented Dec 23, 2025

Summary by CodeRabbit

  • New Features

    • Z-Image pipeline: end-to-end training & inference, prompt encoding, dataset/collation support, LoRA compatibility, and tailored save/load behavior.
    • Smarter model handling: local vs remote model resolution with runtime fallback and filtering of unsupported generation options.
  • Chores

    • Plugin bumped to 0.1.11.
    • Installer made more robust with clearer install flow and graceful handling of optional components.
  • Refactor

    • Startup/config flow streamlined to prefer plugin-provided runtime libraries and configurable env var sourcing.
  • Tests

    • Added tests for model resolution and pipeline kwargs filtering.

@codecov-commenter

codecov-commenter commented Dec 23, 2025

Codecov Report

❌ Patch coverage is 1.04167% with 190 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| api/transformerlab/plugins/image_diffusion/main.py | 1.56% | 126 Missing ⚠️ |
| api/transformerlab/plugin_sdk/plugin_harness.py | 0.00% | 64 Missing ⚠️ |


@ParamThakkar123
Contributor Author

I am getting this error while running a fine-tuning task with any diffusion model on the diffusion trainer plugin:

```
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_M_create
```

I tried fine-tuning SDXL and other Stable Diffusion models but got this error on every run.

@deep1401
Member

> I am getting this error while running a fine-tuning task with any diffusion model on the diffusion trainer plugin:
>
> ```
> terminate called after throwing an instance of 'std::length_error'
>   what():  basic_string::_M_create
> ```
>
> I tried fine-tuning SDXL and other Stable Diffusion models but got this error on every run.

I had this error once; updating timm resolved it. But it may or may not help in your case.

Member

@dadmobile left a comment


I wasn't able to get this running. Let's spend some time this week syncing on what's required, and then we can do a patch release and announce this!

When trying to generate with Z Image Turbo I kept getting some vague error I will have to debug.

When trying to run the train I would get:

```
Error in Job: 'FlowMatchEulerDiscreteScheduler' object has no attribute 'add_noise'
Traceback (most recent call last):
  File "/home/azureuser/transformerlab-app/api/transformerlab/plugin_sdk/transformerlab/sdk/v1/tlab_plugin.py", line 105, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/azureuser/.transformerlab/orgs/3c33c85b-628a-4ca8-93d3-b657cb7973b2/workspace/plugins/diffusion_trainer/main.py", line 818, in train_diffusion_lora
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/azureuser/.transformerlab/orgs/3c33c85b-628a-4ca8-93d3-b657cb7973b2/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/diffusers/configuration_utils.py", line 144, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'FlowMatchEulerDiscreteScheduler' object has no attribute 'add_noise'
```
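For context (a hedged guess from the traceback, not something confirmed in this thread): in diffusers, flow-match schedulers such as FlowMatchEulerDiscreteScheduler expose `scale_noise()` rather than `add_noise()`, which would explain the AttributeError when the trainer's DDPM-style noising path is hit. The interpolation `scale_noise` performs can be sketched without torch:

```python
# Hedged sketch: diffusers' FlowMatchEulerDiscreteScheduler provides
# scale_noise() instead of add_noise(). scale_noise linearly interpolates
# between the clean sample and pure noise:
#   x_t = (1 - sigma_t) * x_0 + sigma_t * noise
# Implemented on plain lists so the idea is runnable without torch.

def flow_match_scale_noise(sample, noise, sigma):
    """Interpolate clean latents toward noise at level sigma in [0, 1]."""
    return [(1.0 - sigma) * x + sigma * n for x, n in zip(sample, noise)]

latents = [0.5, -1.0, 2.0]
noise = [1.0, 1.0, 1.0]

# sigma=0 returns the clean latents; sigma=1 returns pure noise.
print(flow_match_scale_noise(latents, noise, 0.0))  # → [0.5, -1.0, 2.0]
print(flow_match_scale_noise(latents, noise, 1.0))  # → [1.0, 1.0, 1.0]
```

So a fix along the lines of branching on the scheduler type (or calling `scale_noise` for flow-match schedulers) would likely be needed; the exact change belongs in the trainer's noising step.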

Member

@deep1401 left a comment


I think we've let this sit long enough that we're getting version errors now. Running this, I get errors like the following, which you might want to look at:

```
Using default home directory: /home/transformerlab/.transformerlab
Error executing plugin: Could not import module 'BloomPreTrainedModel'. Are this object's requirements defined correctly?
Traceback (most recent call last):
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 2317, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 2347, in _get_module
    raise e
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 2345, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/transformerlab/.transformerlab/envs/transformerlab/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/models/bloom/modeling_bloom.py", line 29, in <module>
    from ...modeling_layers import GradientCheckpointingLayer
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/modeling_layers.py", line 28, in <module>
    from .processing_utils import Unpack
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/processing_utils.py", line 37, in <module>
    from .image_utils import ChannelDimension, ImageInput, is_vision_available
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/transformers/image_utils.py", line 55, in <module>
    from torchvision.transforms import InterpolationMode
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torchvision/__init__.py", line 10, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils  # usort:skip
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torchvision/_meta_registrations.py", line 163, in <module>
    @torch.library.register_fake("torchvision::nms")
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torch/library.py", line 1073, in register
    use_lib._register_fake(
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torch/library.py", line 203, in _register_fake
    handle = entry.fake_impl.register(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/transformerlab/.transformerlab/orgs/05da06f3-1a86-49e4-a511-f100705fa6f9/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/torch/_library/fake_impl.py", line 50, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: operator torchvision::nms does not exist
```

@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds Z-Image pipeline support across training and inference (loading, tokenizer/prompt encoding, FlowMatchSFTLoss, LoRA handling and save paths), bumps diffusion_trainer plugin version, expands installer script, adds model-reference resolution and generation-kwargs filtering for image pipelines, and introduces runtime/config helpers and related tests.

Changes

  • Plugin Manifest (api/transformerlab/plugins/diffusion_trainer/index.json): Bumped version 0.1.10 → 0.1.11 and added ZImagePipeline to model_architectures.
  • Diffusion Trainer, Z-Image (api/transformerlab/plugins/diffusion_trainer/main.py): Added build_zimage_model_configs() and encode_prompt_zimage(); detect/instantiate ZImagePipeline; Z-Image-specific model/tokenizer loading, device/dtype and selective parameter freezing, FlowMatchSFTLoss integration, LoRA/PEFT handling for Z-Image, dataset collation/preprocessing updates (prompts, original_sizes, crops), and expanded LoRA save logic (safetensors/PyTorch fallback, async adaptor-info write).
  • Installer Script (api/transformerlab/plugins/diffusion_trainer/setup.sh): Reworked install script: shebang, combined dependency install (diffusers, transformers, peft, diffsynth), improved xformers guard and non-fatal error handling, clearer messaging, and no-fail optional installs.
  • Image Diffusion, Model Resolution & Kwarg Filtering (api/transformerlab/plugins/image_diffusion/main.py, api/transformerlab/plugins/image_diffusion/diffusion_worker.py): Added _is_probable_hf_repo_id, _extract_hf_repo_from_model_metadata, and resolve_diffusion_model_reference to prefer local model dirs or fall back to HF repo ids; integrated resolution into pipeline loading/sharding/device-map; added filter_generation_kwargs_for_pipeline and applied it before pipeline invocation.
  • Plugin Harness / Runtime Config (api/transformerlab/plugin_sdk/plugin_harness.py): Added get_db_config_value() for SQLite-config lookups and configure_plugin_runtime_library_paths() to prefer plugin venv CUDA/NCCL libs via LD_LIBRARY_PATH; made set_config_env_vars() params optional and invoked runtime path configuration on startup.
  • Tests (api/test/api/test_diffusion.py): Added tests for model reference resolution (directory vs HF repo fallback) and for filtering generation kwargs against pipeline signatures.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Trainer as Training Pipeline
    participant Loader as ModelConfig Loader
    participant ZPipe as ZImagePipeline
    participant Tokenizer as Z-Image Tokenizer
    participant Loss as FlowMatchSFTLoss
    participant Saver as LoRA Saver

    User->>Trainer: start Z-Image training
    Trainer->>Loader: build_zimage_model_configs(model_path)
    Loader-->>Trainer: model & tokenizer configs
    Trainer->>ZPipe: instantiate pipeline (device / dtype / freeze parts)
    ZPipe-->>Trainer: pipeline ready

    Trainer->>Tokenizer: encode_prompt_zimage(prompts)
    Tokenizer-->>Trainer: prompt embeddings

    Trainer->>Loss: forward(batch, embeddings, sizes/crops)
    Loss-->>Trainer: loss
    Trainer->>Saver: save LoRA (safetensors or fallback)
    Saver-->>User: checkpoint saved
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

"I nibble on tokens in moonlit code,
Z-Image branches where new paths go,
prompts curled, LoRA threads entwined,
checkpoints saved, configs aligned,
a rabbit hops — the model grows." 🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 63.16%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title 'Add Z Image LoRA fine tuning support' accurately and directly describes the primary purpose of the PR, introducing Z Image LoRA fine-tuning capabilities across multiple files.


Contributor

@coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
api/transformerlab/plugins/diffusion_trainer/main.py (1)

570-586: Remove duplicate VAE xFormers enablement.

The VAE call is executed twice for non‑ZImage paths. Keep a single guarded call.

♻️ Suggested cleanup
```diff
-            if hasattr(vae, "enable_xformers_memory_efficient_attention"):
-                vae.enable_xformers_memory_efficient_attention()
-            if not is_zimage and hasattr(vae, "enable_xformers_memory_efficient_attention"):
+            if not is_zimage and hasattr(vae, "enable_xformers_memory_efficient_attention"):
                 vae.enable_xformers_memory_efficient_attention()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/transformerlab/plugins/diffusion_trainer/main.py` around lines 570 - 586,
The VAE's enable_xformers_memory_efficient_attention is being called twice in
the xFormers enable block; remove the duplicate call so the code only invokes
vae.enable_xformers_memory_efficient_attention() once and guard it with
hasattr(vae, "enable_xformers_memory_efficient_attention") and the is_zimage
check as appropriate (use unet.enable_xformers_memory_efficient_attention() and
a single conditional call to vae.enable_xformers_memory_efficient_attention()
when available and when not is_zimage).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/transformerlab/plugins/diffusion_trainer/main.py`:
- Around line 62-95: In build_zimage_model_configs, validate the local-path glob
results (transformer_paths, text_encoder_paths, vae_paths, and tokenizer path)
when model_id_or_path is a directory: if any of these lists/paths are empty or
missing, raise a clear ValueError describing which component is missing instead
of letting downstream opaque errors occur; keep using ModelConfig for each
component but fail fast with an explicit message naming the missing asset (e.g.,
"missing transformer files", "missing text_encoder files", "missing vae file",
or "missing tokenizer directory") so callers can immediately act.
- Around line 453-483: When defaulting Z-Image to BF16 in the mixed-precision
logic, guard that choice with an actual hardware BF16 support check: inside the
block that sets weight_dtype based on is_zimage and mixed_precision (referencing
is_zimage, mixed_precision, weight_dtype, device), only set weight_dtype =
torch.bfloat16 if CUDA is available and torch.cuda.is_bf16_supported() returns
True; otherwise fall back to torch.float32 (or respect explicit "bf16" request).
Ensure the check runs before assigning weight_dtype so non-BF16 GPUs/CPUs won't
get bfloat16 by default.
- Around line 491-553: The build currently relies on diffsynth APIs used around
ZImagePipeline.from_pretrained and pipe.scheduler.set_timesteps (seen in
main.py), but setup.sh installs diffsynth without a version pin; update setup.sh
to pin diffsynth to a compatible minimum/locked version (e.g., change the
install spec to diffsynth>=0.X.Y or a specific tested release) so the pipeline
code (ZImagePipeline.from_pretrained, scheduler.set_timesteps, and related
behavior) remains stable across environments.
- Around line 1005-1018: The code references an unreleased class
FlowMatchSFTLoss (imported from diffsynth.diffusion.loss) which isn't available
in public diffsynth v2.0.4; update the codebase and dependency declarations:
either replace FlowMatchSFTLoss usage in main.py (around the is_zimage branch
where input_latents, prompt_embeds, vae_encoder, and encode_prompt_zimage are
used) with a public, supported loss class or vendor the missing implementation,
and then pin and document the exact diffsynth fork/commit or custom package in
requirements.txt or pyproject.toml; also ensure the replacement/venor returns a
PyTorch tensor (compatible with the .item() call) and preserve the
gradient_checkpointing flags (use_gradient_checkpointing and
use_gradient_checkpointing_offload) so runtime behavior remains consistent.

In `@api/transformerlab/plugins/diffusion_trainer/setup.sh`:
- Around line 3-7: Update the PEFT requirement from "peft>=0.15.0" to
"peft>=0.17.0" in the shell install line (replace the existing uv pip install
"peft>=0.15.0" diffsynth command with uv pip install "peft>=0.17.0" diffsynth)
and also adjust the PEFT version constraint in the project-level pyproject.toml
optional dependencies entries so they no longer pin to 0.14.0/0.15.2 but allow
>=0.17.0, ensuring consistency with diffusers 0.36.0 and other diffusion
plugins.
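The first inline item above asks build_zimage_model_configs to fail fast when a local model directory is incomplete. A minimal sketch of that idea, with illustrative file patterns (the actual Z-Image directory layout may differ):

```python
# Hedged sketch of fail-fast validation; the glob patterns are
# assumptions, not the real Z-Image checkpoint layout.
import glob
import os

def validate_zimage_dir(model_dir):
    """Raise a ValueError naming the first missing Z-Image component."""
    checks = {
        "transformer files": glob.glob(os.path.join(model_dir, "transformer*.safetensors")),
        "text_encoder files": glob.glob(os.path.join(model_dir, "text_encoder*.safetensors")),
        "vae file": glob.glob(os.path.join(model_dir, "vae*.safetensors")),
        "tokenizer directory": (
            [os.path.join(model_dir, "tokenizer")]
            if os.path.isdir(os.path.join(model_dir, "tokenizer"))
            else []
        ),
    }
    for name, found in checks.items():
        if not found:
            raise ValueError(f"Z-Image model at {model_dir} is missing {name}")
```

Calling this before constructing any ModelConfig turns a downstream opaque load failure into an immediate, actionable error message.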
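The BF16 guard suggested above can likewise be sketched. Hardware probes are passed in as plain booleans so the logic is self-contained; in the plugin they would come from torch.cuda.is_available() and torch.cuda.is_bf16_supported():

```python
# Hedged sketch of the dtype-selection guard. Dtypes are returned as
# strings here purely so the logic runs without torch installed.

def pick_weight_dtype(requested, cuda_available, bf16_supported):
    """Return 'bfloat16' only when the hardware actually supports it."""
    wants_bf16 = requested in ("bf16", None)  # Z-Image defaults to BF16
    if wants_bf16 and cuda_available and bf16_supported:
        return "bfloat16"
    if requested == "fp16" and cuda_available:
        return "float16"
    return "float32"

# A non-BF16 GPU (or CPU) falls back to float32 instead of crashing later.
print(pick_weight_dtype(None, cuda_available=True, bf16_supported=False))  # → float32
```

The key point from the review item: run the capability check before assigning weight_dtype, so the BF16 default never reaches hardware that cannot execute it.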

---

Nitpick comments:
In `@api/transformerlab/plugins/diffusion_trainer/main.py`:
- Around line 570-586: The VAE's enable_xformers_memory_efficient_attention is
being called twice in the xFormers enable block; remove the duplicate call so
the code only invokes vae.enable_xformers_memory_efficient_attention() once and
guard it with hasattr(vae, "enable_xformers_memory_efficient_attention") and the
is_zimage check as appropriate (use
unet.enable_xformers_memory_efficient_attention() and a single conditional call
to vae.enable_xformers_memory_efficient_attention() when available and when not
is_zimage).

Comment on lines +491 to 553
```python
pipe = None
if is_zimage:
    # Ensure the model is downloaded locally if it's not already a directory
    if not os.path.isdir(pretrained_model_name_or_path):
        from huggingface_hub import snapshot_download

        print(f"Downloading Z-Image model {pretrained_model_name_or_path} from Hugging Face...")
        pretrained_model_name_or_path = snapshot_download(
            repo_id=pretrained_model_name_or_path,
            allow_patterns=["*.safetensors", "*.json", "tokenizer/*"],
        )
        print(f"Model downloaded to: {pretrained_model_name_or_path}")

    model_configs, tokenizer_config = build_zimage_model_configs(pretrained_model_name_or_path)
    pipe = ZImagePipeline.from_pretrained(
        torch_dtype=weight_dtype,
        device=device,
        model_configs=model_configs,
        tokenizer_config=tokenizer_config,
    )

    pipe.scheduler.set_timesteps(int(args.get("num_train_timesteps", 1000)), training=True)
    noise_scheduler = pipe.scheduler
    tokenizer = pipe.tokenizer
    text_encoder = pipe.text_encoder
    vae_encoder = pipe.vae_encoder
    vae_decoder = pipe.vae_decoder
    unet = pipe.dit
    model_component_name = "dit"
    text_encoder_2 = None
    tokenizer_2 = None
    vae = None
else:
    temp_pipeline = AutoPipelineForText2Image.from_pretrained(pretrained_model_name_or_path, **pipeline_kwargs)

    # Extract components from the loaded pipeline
    noise_scheduler = temp_pipeline.scheduler
    tokenizer = temp_pipeline.tokenizer
    text_encoder = temp_pipeline.text_encoder
    vae = temp_pipeline.vae

    # Handle different architectures: FluxPipeline uses 'transformer', others use 'unet'
    # We use 'unet' as a unified variable name for the main model component regardless of architecture
    if hasattr(temp_pipeline, "transformer"):
        # FluxPipeline and other transformer-based models
        unet = temp_pipeline.transformer
        model_component_name = "transformer"
    else:
        # SD 1.x, SDXL, SD3 and other UNet-based models
        unet = temp_pipeline.unet
        model_component_name = "unet"

    # Handle SDXL case with dual text encoders
    text_encoder_2 = getattr(temp_pipeline, "text_encoder_2", None)
    tokenizer_2 = getattr(temp_pipeline, "tokenizer_2", None)

    # Clean up temporary pipeline
    del temp_pipeline
    torch.cuda.empty_cache() if torch.cuda.is_available() else None

print(f"Model components loaded successfully: {pretrained_model_name_or_path}")
print(f"Architecture detected - Model component ({model_component_name}): {type(unet).__name__}")
if text_encoder_2 is not None:
```

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find dependency files to see diffsynth version pinning
find . -type f \( -name "requirements*.txt" -o -name "pyproject.toml" -o -name "setup.py" -o -name "poetry.lock" -o -name "Pipfile" \) | head -20

Repository: transformerlab/transformerlab-app

Length of output: 216


🏁 Script executed:

# Search for diffsynth imports in the codebase to understand usage
rg "diffsynth" --type py -i | head -30

Repository: transformerlab/transformerlab-app

Length of output: 378


🏁 Script executed:

# Look for the specific file mentioned in the review
head -50 api/transformerlab/plugins/diffusion_trainer/main.py

Repository: transformerlab/transformerlab-app

Length of output: 1472


🏁 Script executed:

# Check the pyproject.toml for diffsynth version
cat api/pyproject.toml | grep -A 50 "dependencies"

Repository: transformerlab/transformerlab-app

Length of output: 2066


🌐 Web query:

diffsynth ZImagePipeline from_pretrained API documentation

💡 Result:

diffsynth.pipelines.z_image.ZImagePipeline.from_pretrained is defined in the DiffSynth-Studio repo here: diffsynth/pipelines/z_image.py. It has this signature (as of main):

  • from_pretrained(torch_dtype=torch.bfloat16, device=get_device_type(), model_configs: list[ModelConfig]=[], tokenizer_config: ModelConfig=ModelConfig(model_id="Tongyi-MAI/Z-Image-Turbo", origin_file_pattern="tokenizer/"), vram_limit: float=None, enable_npu_patch: bool=True) [1]

Parameter meaning (project docs/examples):

  • torch_dtype: compute dtype used by the pipeline (not necessarily how weights are stored/offloaded). [2]
  • device: pipeline execution device (e.g. "cuda"). [1]
  • model_configs: list of ModelConfig entries describing what to download/load (by model_id + origin_file_pattern, or local path, and optionally VRAM/offload settings). [2][3]
  • tokenizer_config: ModelConfig pointing to the tokenizer folder (default is Z-Image-Turbo tokenizer). [1][3]
  • vram_limit: enables VRAM-management behavior during download_and_load_models(...) (used in official examples as “available VRAM (GiB) minus some margin”). [1][3]
  • enable_npu_patch: applies an NPU compatibility patch during init. [1]

Example usage (Z-Image Turbo quick start): uses model_configs=[ModelConfig(... transformer ...), ModelConfig(... text_encoder ...), ModelConfig(... vae ...)] plus tokenizer_config=ModelConfig(..., origin_file_pattern="tokenizer/") and an optional vram_limit=.... [3]

Related docs pointer: the general “Loading Models” section explains from_pretrained, ModelConfig, remote vs local loading, and env vars like DIFFSYNTH_MODEL_BASE_PATH / DIFFSYNTH_SKIP_DOWNLOAD. [2]

Sources: [1] [2] [3]


🌐 Web query:

diffsynth scheduler set_timesteps training parameter

💡 Result:

In DiffSynth(-Studio) the meaning of the training parameter depends on which scheduler you’re using:

FlowMatchScheduler (diffsynth/schedulers/flow_match.py)

set_timesteps is defined as:

set_timesteps(num_inference_steps=100, denoising_strength=1.0, training=False, shift=None)
  • training=False (default): only builds self.sigmas and self.timesteps.
  • training=True: additionally computes self.linear_timesteps_weights, which is later consumed by training_weight(timestep) for per-timestep loss weighting. (huggingface.co)

So training is not “how many training timesteps”—it’s a flag to enable training-time timestep weights.

Typical usage for training with FlowMatch:

  • Call scheduler.set_timesteps(N, training=True) where N is the number of discrete timesteps you want to sample from (often N=1000 to match the model’s num_train_timesteps scale).
  • Sample an index/value from scheduler.timesteps, use add_noise(...), and optionally weight the loss via scheduler.training_weight(timestep). (huggingface.co)

EnhancedDDIMScheduler (DDIM-style training scripts)

Some DiffSynth training code resets the scheduler and calls:

  • scheduler.set_timesteps(1000)

Here, there is no training= flag involved; it’s just setting the timestep grid length (often to 1000). (gitee.com)
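A toy illustration of the training=True distinction described above (not diffsynth's actual implementation; the real per-timestep weighting scheme differs from the uniform placeholder here):

```python
# Toy FlowMatch-style schedule: training=True additionally builds
# per-timestep loss weights alongside the sigma/timestep grid, mirroring
# how diffsynth's FlowMatchScheduler later serves training_weight(timestep).

def set_timesteps(num_steps=100, training=False):
    """Return (sigmas, timesteps, weights); weights only when training."""
    sigmas = [(i + 1) / num_steps for i in range(num_steps)]
    timesteps = [s * 1000 for s in sigmas]  # scale to the 0..1000 range
    weights = None
    if training:
        # Placeholder uniform weights; the real scheduler computes
        # linear_timesteps_weights for per-timestep loss weighting.
        weights = [1.0 for _ in sigmas]
    return sigmas, timesteps, weights

sigmas, timesteps, weights = set_timesteps(4, training=True)
print(sigmas)   # → [0.25, 0.5, 0.75, 1.0]
print(weights)  # → [1.0, 1.0, 1.0, 1.0]
```

This makes the point concrete: the flag changes what auxiliary state the scheduler prepares, not how many timesteps exist.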

Citations:


🏁 Script executed:

# Check if diffsynth is listed in a separate requirements file for the plugin
find api/transformerlab/plugins/diffusion_trainer -name "*.txt" -o -name "*.toml" | xargs cat 2>/dev/null

Repository: transformerlab/transformerlab-app

Length of output: 59


🏁 Script executed:

# Check for any setup files or configuration in the diffusion_trainer directory
ls -la api/transformerlab/plugins/diffusion_trainer/

Repository: transformerlab/transformerlab-app

Length of output: 432


🏁 Script executed:

# Search the entire codebase for where diffsynth dependency might be specified
rg "diffsynth" --type toml --type txt

Repository: transformerlab/transformerlab-app

Length of output: 59


🏁 Script executed:

# Check what's in setup.sh - this likely manages plugin dependencies
cat api/transformerlab/plugins/diffusion_trainer/setup.sh

Repository: transformerlab/transformerlab-app

Length of output: 398


🏁 Script executed:

# Also check if there's any documentation about how plugin dependencies are managed
find . -name "setup.sh" -exec cat {} \; | head -100

Repository: transformerlab/transformerlab-app

Length of output: 4920


Pin diffsynth version to ensure API compatibility across environments.

The code correctly uses ZImagePipeline.from_pretrained(device=..., model_configs=..., tokenizer_config=...) and scheduler.set_timesteps(..., training=True), which are valid in current diffsynth versions. However, setup.sh installs diffsynth without a version constraint, whereas peft is pinned to >=0.15.0. Without version pinning, future diffsynth releases could introduce breaking API changes. Add a version constraint (e.g., "diffsynth>=0.X.Y") to setup.sh to guarantee consistent behavior.
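For example, the combined install line in setup.sh could become something like the following (version numbers are illustrative; pin whichever release was actually tested with this PR):

```shell
#!/usr/bin/env bash
# Hypothetical pinned install for setup.sh; exact versions are assumptions.
uv pip install "diffusers>=0.36.0" transformers "peft>=0.17.0" "diffsynth==2.0.4"
```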

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/transformerlab/plugins/diffusion_trainer/main.py` around lines 491 - 553,
The build currently relies on diffsynth APIs used around
ZImagePipeline.from_pretrained and pipe.scheduler.set_timesteps (seen in
main.py), but setup.sh installs diffsynth without a version pin; update setup.sh
to pin diffsynth to a compatible minimum/locked version (e.g., change the
install spec to diffsynth>=0.X.Y or a specific tested release) so the pipeline
code (ZImagePipeline.from_pretrained, scheduler.set_timesteps, and related
behavior) remains stable across environments.

Comment on lines +1005 to 1018
```python
if is_zimage:
    pixel_values = batch["pixel_values"].to(device, dtype=weight_dtype)
    input_latents = vae_encoder(pixel_values)
    prompt_embeds = encode_prompt_zimage(pipe, batch["prompt"], device)

    loss = FlowMatchSFTLoss(
        pipe,
        input_latents=input_latents,
        prompt_embeds=prompt_embeds,
        image_embeds=None,
        image_latents=None,
        use_gradient_checkpointing=args.get("gradient_checkpointing", False),
        use_gradient_checkpointing_offload=False,
    )
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find where FlowMatchSFTLoss is defined or imported
rg "FlowMatchSFTLoss" --type py -B 2 -A 2

Repository: transformerlab/transformerlab-app

Length of output: 975


🏁 Script executed:

# Also check the specific file to see imports
head -100 api/transformerlab/plugins/diffusion_trainer/main.py | grep -E "^import|^from"

Repository: transformerlab/transformerlab-app

Length of output: 895


🏁 Script executed:

# Check if it's defined in the file itself
rg "class FlowMatchSFTLoss" --type py -A 20

Repository: transformerlab/transformerlab-app

Length of output: 59


🏁 Script executed:

# Check if diffsynth is vendored or installed
find . -name "*diffsynth*" -type d | head -20

Repository: transformerlab/transformerlab-app

Length of output: 59


🏁 Script executed:

# Check requirements files for diffsynth version
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" | xargs grep -l "diffsynth" 2>/dev/null

Repository: transformerlab/transformerlab-app

Length of output: 61


🏁 Script executed:

# Search for FlowMatchSFTLoss in the entire repo
rg "class FlowMatchSFTLoss" --type py

Repository: transformerlab/transformerlab-app

Length of output: 59


🏁 Script executed:

# Check for any diffsynth submodules or vendored code
ls -la | grep -i diffsynth || echo "No top-level diffsynth directory"

Repository: transformerlab/transformerlab-app

Length of output: 109


🌐 Web query:

diffsynth FlowMatchSFTLoss class signature parameters return type

💡 Result:

I couldn’t find any public FlowMatchSFTLoss class in the searchable DiffSynth-Studio sources/docs or the PyPI diffsynth project page (v2.0.4, released Jan 30, 2026). The only clearly indexed “FlowMatch*” API I could locate is FlowMatchScheduler (scheduler implementation), not an SFT loss class. [1], [2]

Because FlowMatchSFTLoss doesn’t appear in public docs/search results, I can’t reliably state its class signature, __init__ parameters, or forward/return type from web sources.

How to get the exact signature from your installed diffsynth:

```python
import inspect
from diffsynth import FlowMatchSFTLoss  # or its real import path if different

print(FlowMatchSFTLoss)
print(inspect.signature(FlowMatchSFTLoss.__init__))
print(inspect.signature(FlowMatchSFTLoss.forward))
print(inspect.getsource(FlowMatchSFTLoss.forward))
```

If you paste the import path you’re using (or the traceback showing where FlowMatchSFTLoss is defined), I can tell you the exact parameters and return type from that definition.

Sources:
[1] PyPI project page for diffsynth (v2.0.4, Jan 30, 2026) (pypi.org)
[2] DiffSynth-Studio docs mentioning FlowMatchScheduler (diffsynth-studio.readthedocs.io)

Citations:


🏁 Script executed:

# Check if there's a local diffsynth implementation in the codebase
rg "from diffsynth.diffusion" --type py -l

Repository: transformerlab/transformerlab-app

Length of output: 129


🏁 Script executed:

# Search for any loss-related classes defined locally
rg "class.*Loss" --type py | grep -i "flowmatch\|sft"

Repository: transformerlab/transformerlab-app

Length of output: 59


🏁 Script executed:

# Check if diffsynth.diffusion.loss is implemented anywhere locally
find . -path "*/diffsynth/diffusion/loss*" -type f

Repository: transformerlab/transformerlab-app

Length of output: 59


🏁 Script executed:

# Check the git history to see if this was recently added
git log --oneline --all -- "*FlowMatchSFTLoss*" 2>/dev/null | head -10

Repository: transformerlab/transformerlab-app

Length of output: 59


🏁 Script executed:

# Check what happens with the 'loss' variable after it's created
sed -n '1005,1050p' api/transformerlab/plugins/diffusion_trainer/main.py

Repository: transformerlab/transformerlab-app

Length of output: 2444


🏁 Script executed:

# Check for requirements to see diffsynth version constraint
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" | xargs cat 2>/dev/null | grep -i diffsynth

Repository: transformerlab/transformerlab-app

Length of output: 59


🏁 Script executed:

# Check if there's a git submodule or fork
cat .gitmodules 2>/dev/null | grep -i diffsynth || echo "No submodule found"

Repository: transformerlab/transformerlab-app

Length of output: 95


🏁 Script executed:

# Check for any notes or documentation about diffsynth version
find . -name "*.md" -o -name "*.txt" | xargs grep -l "FlowMatchSFTLoss\|diffsynth.*version" 2>/dev/null | head -5

Repository: transformerlab/transformerlab-app

Length of output: 59


FlowMatchSFTLoss is not found in any public diffsynth version—verify dependency.

FlowMatchSFTLoss does not exist in the publicly available diffsynth library (v2.0.4, latest on PyPI). The code imports from diffsynth.diffusion.loss, but this class is undocumented and unreleased. Confirm that your development environment uses a specific diffsynth fork, branch, or unreleased version, and document this dependency in requirements.txt or pyproject.toml to prevent deployment failures. The usage pattern (.item() at line 1019) indicates it should return a PyTorch tensor, but this callable contract cannot be verified without access to the actual implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/transformerlab/plugins/diffusion_trainer/main.py` around lines 1005 -
1018, The code references an unreleased class FlowMatchSFTLoss (imported from
diffsynth.diffusion.loss) which isn't available in public diffsynth v2.0.4;
update the codebase and dependency declarations: either replace FlowMatchSFTLoss
usage in main.py (around the is_zimage branch where input_latents,
prompt_embeds, vae_encoder, and encode_prompt_zimage are used) with a public,
supported loss class or vendor the missing implementation, and then pin and
document the exact diffsynth fork/commit or custom package in requirements.txt
or pyproject.toml; also ensure the replacement/vendored implementation returns a PyTorch tensor
(compatible with the .item() call) and preserve the gradient_checkpointing flags
(use_gradient_checkpointing and use_gradient_checkpointing_offload) so runtime
behavior remains consistent.
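Since FlowMatchSFTLoss may only exist in a specific DiffSynth-Studio fork, a small availability probe (hypothetical helper; only the import path is taken from the review above) would let the plugin fail with a clear message instead of an opaque ImportError at training time:

```python
import importlib.util


def has_flow_match_sft_loss() -> bool:
    """Return True only if the fork-specific FlowMatchSFTLoss is importable."""
    if importlib.util.find_spec("diffsynth") is None:
        return False  # diffsynth is not installed at all
    try:
        from diffsynth.diffusion.loss import FlowMatchSFTLoss  # noqa: F401
    except ImportError:
        return False  # installed diffsynth lacks the unreleased loss class
    return True


if not has_flow_match_sft_loss():
    print(
        "FlowMatchSFTLoss unavailable: pin the exact DiffSynth-Studio "
        "fork/commit in requirements.txt before enabling Z-Image training."
    )
```

Running this probe at plugin startup surfaces the dependency problem before any GPU work begins.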

Comment on lines +3 to +7
# Install compatible torch and torchvision first to avoid version conflicts
uv pip install torch torchvision diffusers transformers --extra-index-url https://download.pytorch.org/whl/cu118

# Install PEFT and diffsynth
uv pip install "peft>=0.15.0" diffsynth
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

torch diffusers peft transformers compatibility requirements 2025

💡 Result:

Practical compatibility requirements (late‑2025 / 2026-era HF stack)

Diffusers

  • diffusers v0.36.0 is tested on Python 3.8+ and PyTorch 1.4+ and the docs explicitly install it alongside transformers (e.g. diffusers["torch"] transformers). [1]

Diffusers ↔ PEFT (LoRA/adapters)

  • If you install Diffusers from git / dev builds (e.g., 0.36.0.dev0), Diffusers can enforce peft>=0.17.0; a real-world failure report shows an import error when peft==0.7.0 is present and Diffusers requires peft>=0.17.0. [2]

Transformers ↔ PEFT

  • Transformers v5 is incompatible with peft<0.18.0; if you plan to use Transformers v5, use peft>=0.18.0. [3]
  • PEFT 0.18.0+ also requires Python 3.10+ (drops 3.9). [3]

Transformers ↔ huggingface-hub (common resolver trap)

  • Recent transformers (example given: 4.57.3, as of 2026‑01‑05) still requires huggingface-hub<1.0; letting your environment pick huggingface-hub==1.x can force downgrades/breakage. [4]

A “safe” 2025-ish pinning strategy (what the above implies)

  • If staying on Transformers 4.x: use diffusers>=0.36, transformers~=4.5x, peft>=0.17, and ensure huggingface-hub<1.0. [1][2][4]
  • If moving to Transformers 5.x: use peft>=0.18.0 (and therefore Python 3.10+), plus whatever Diffusers release notes recommend for that Diffusers version. [3]

Sources: Diffusers installation docs v0.36.0 [1]; Diffusers issue documenting peft>=0.17.0 requirement on dev installs [2]; PEFT 0.18.0 release notes on Transformers v5 + Python requirement [3]; dependency note on transformers requiring huggingface-hub<1.0 [4].


🏁 Script executed:

# Search for requirements files and dependency specifications in the project
find . -type f \( -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "constraints*.txt" \) | head -20

Repository: transformerlab/transformerlab-app

Length of output: 216


🏁 Script executed:

# Check the setup.sh file and surrounding context
cat -n api/transformerlab/plugins/diffusion_trainer/setup.sh

Repository: transformerlab/transformerlab-app

Length of output: 482


🏁 Script executed:

# Look for other setup or installation documentation
find . -type f \( -name "*.md" -o -name "*.txt" \) | xargs grep -l "torch\|diffusers\|peft" 2>/dev/null | head -10

Repository: transformerlab/transformerlab-app

Length of output: 302


🏁 Script executed:

# Check the actual pyproject.toml for dependency management
cat -n api/pyproject.toml | head -100

Repository: transformerlab/transformerlab-app

Length of output: 3219


🏁 Script executed:

# Search for torch/base plugin setup references
rg -n "torch|base.plugin|base_plugin|plugin.*venv" --type py --type toml --type sh api/

Repository: transformerlab/transformerlab-app

Length of output: 50389


🏁 Script executed:

# Check if there's a parent/base setup script
find api/transformerlab/plugins -name "setup.sh" -o -name "base*.sh" | xargs cat -n

Repository: transformerlab/transformerlab-app

Length of output: 19659


🏁 Script executed:

# Check if other plugins use version pinning or rely on project levels
rg -A2 "uv pip install" api/transformerlab/plugins/*/setup.sh | grep -E "(peft|diffusers|transformers)" | head -30

Repository: transformerlab/transformerlab-app

Length of output: 2168


🏁 Script executed:

# Verify the actual pyproject.toml peft versions for all platforms
grep -A5 "peft==" api/pyproject.toml

Repository: transformerlab/transformerlab-app

Length of output: 529


Increase peft version constraint to >=0.17.0 for compatibility with diffusers 0.36.0.

The specified peft>=0.15.0 is below the minimum required by diffusers 0.36.0 (which requires peft>=0.17.0 per Hugging Face release notes). This conflicts with other diffusion-based plugins in the codebase (llama_trainer, image_diffusion, fastchat_server) which correctly use peft>=0.17.0. Additionally, the project-level pyproject.toml pins peft to 0.14.0 (nvidia/cpu) and 0.15.2 (rocm), which will force a downgrade and create a version mismatch at runtime.

Suggested fix
uv pip install "peft>=0.17.0" diffsynth

Verify this is also addressed in the project-level pyproject.toml optional dependencies.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/transformerlab/plugins/diffusion_trainer/setup.sh` around lines 3 - 7,
Update the PEFT requirement from "peft>=0.15.0" to "peft>=0.17.0" in the shell
install line (replace the existing uv pip install "peft>=0.15.0" diffsynth
command with uv pip install "peft>=0.17.0" diffsynth) and also adjust the PEFT
version constraint in the project-level pyproject.toml optional dependencies
entries so they no longer pin to 0.14.0/0.15.2 but allow >=0.17.0, ensuring
consistency with diffusers 0.36.0 and other diffusion plugins.
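A coarse post-install check along these lines (helper names are illustrative, not part of the repo) would catch a resolver-forced peft downgrade before runtime:

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Coarse numeric parse: '0.17.0' -> (0, 17, 0); ignores pre-release tags."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_floor(installed: str, floor: str) -> bool:
    """True if the installed version satisfies the minimum floor."""
    return parse_version(installed) >= parse_version(floor)


def check_installed(pkg: str, floor: str) -> bool:
    """Look up the installed distribution and compare against the floor."""
    try:
        return meets_floor(version(pkg), floor)
    except PackageNotFoundError:
        return False


# e.g. check_installed("peft", "0.17.0") after running setup.sh
```

For production use, the packaging library's Version class handles pre-release and post-release tags more faithfully than this numeric sketch.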

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/transformerlab/plugin_sdk/plugin_harness.py`:
- Around line 110-119: The parameter target_env_var in set_config_env_vars is
declared Optional but later used as a dict key in os.environ[target_env_var],
which will raise if None; update set_config_env_vars to either require
target_env_var (change its type to str) or add a guard at the top of the
function (e.g., raise ValueError or return early if target_env_var is None)
before calling get_db_config_value and setting os.environ, referencing the
function name set_config_env_vars and the os.environ assignment to locate and
fix the issue.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/transformerlab/plugins/diffusion_trainer/setup.sh`:
- Around line 1-9: The install script exits successfully even if the core
dependency install fails; update the shell script (setup.sh) to fail fast by
enabling strict error handling (add set -euo pipefail at the top) or by
appending explicit failure checks to the uv pip install invocation (e.g., ensure
the "uv pip install" command in the script will exit non‑zero on failure and
propagate that by using || exit 1). Target the top of the script and the "uv pip
install" line to ensure dependency install failures surface immediately.

@ParamThakkar123
Contributor Author

ParamThakkar123 commented Feb 18, 2026

I fixed this and tested it on my Azure VM.

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
api/transformerlab/plugins/image_diffusion/main.py (1)

4-5: Consider centralizing model-reference helpers to avoid drift.
These utilities mirror the ones in api/transformerlab/plugins/image_diffusion/diffusion_worker.py; extracting them into a shared module would reduce duplication and keep behavior consistent.

Also applies to: 327-474

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/transformerlab/plugins/image_diffusion/main.py` around lines 4 - 5, The
model-reference helper functions duplicated between image_diffusion main and
diffusion_worker should be extracted into a single shared module (e.g.,
model_reference_helpers) and both modules should import those helpers instead of
keeping separate copies; locate the duplicated utilities in
image_diffusion/plugins/image_diffusion/main.py and
image_diffusion/plugins/image_diffusion/diffusion_worker.py, move the helper
definitions into the new shared module, update both files to import the helpers
(removing the local copies and any redundant imports like inspect/Path if no
longer needed), and run/adjust any unit tests or usage sites to ensure the
unified helpers' API matches prior behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@api/transformerlab/plugins/image_diffusion/diffusion_worker.py`:
- Around line 12-221: The helper utilities in diffusion_worker.py (functions
_is_probable_hf_repo_id, _extract_hf_repo_from_model_metadata,
resolve_diffusion_model_reference, filter_generation_kwargs_for_pipeline)
duplicate logic from main.py; refactor by extracting these helpers into a single
shared module (e.g., a utilities or helpers module) and replace the local
implementations with imports from that module, updating any local references
(including imports used inside _extract_hf_repo_from_model_metadata such as
ModelService/asyncio) to use the centralized implementation so there is a single
source of truth and no duplicated code.

---

Nitpick comments:
In `@api/transformerlab/plugins/image_diffusion/main.py`:
- Around line 4-5: The model-reference helper functions duplicated between
image_diffusion main and diffusion_worker should be extracted into a single
shared module (e.g., model_reference_helpers) and both modules should import
those helpers instead of keeping separate copies; locate the duplicated
utilities in image_diffusion/plugins/image_diffusion/main.py and
image_diffusion/plugins/image_diffusion/diffusion_worker.py, move the helper
definitions into the new shared module, update both files to import the helpers
(removing the local copies and any redundant imports like inspect/Path if no
longer needed), and run/adjust any unit tests or usage sites to ensure the
unified helpers' API matches prior behavior.
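The kwargs-filtering helper flagged for centralization can be a small signature-based filter; this sketch assumes the shared-module refactor and uses generic names rather than the repo's exact API:

```python
import inspect


def filter_kwargs_for_pipeline(pipeline_call, kwargs: dict) -> dict:
    """Keep only kwargs the callable accepts; pass all through if it takes **kwargs."""
    params = inspect.signature(pipeline_call).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


def fake_pipeline(prompt, num_inference_steps=30, guidance_scale=7.5):
    """Stand-in for a diffusers pipeline __call__ with a fixed signature."""
    return prompt


filtered = filter_kwargs_for_pipeline(
    fake_pipeline, {"prompt": "a cat", "guidance_scale": 5.0, "unsupported_arg": 1}
)
```

Putting one copy of this in a shared module keeps main.py and diffusion_worker.py from drifting apart.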

# This avoids resolver-driven torch/torchvision drift (e.g. missing torchvision::nms).
uv pip install \
"diffusers==0.36.0" \
"transformers==4.57.1" \
Member

This is the standard diffusers and transformers version we are using. You shouldn't have to put this in here?
Or are you saying that you want to upgrade peft but doing so changes diffusers and transformers version? If so, do we just upgrade peft in pyproject.toml?
Or is it because of diffsynth?

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
api/transformerlab/plugin_sdk/plugin_harness.py (1)

110-123: Log message doesn't distinguish global config from team config.

Line 121 prints "user" or "team", but when both user_id and team_id are None, the config is actually sourced from global scope. Consider updating for accuracy:

-            print(f"Set {target_key} from {'user' if user_id else 'team'} config")
+            source = "user" if user_id else ("team" if team_id else "global")
+            print(f"Set {target_key} from {source} config")

Similarly for line 123.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/transformerlab/plugin_sdk/plugin_harness.py` around lines 110 - 123, In
set_config_env_vars, the log messages use {'user' if user_id else 'team'} which
is incorrect when both user_id and team_id are None (global config); compute a
source string like source = 'user' if user_id else 'team' if team_id else
'global' and use that source variable in both the success print (after setting
os.environ[target_key]) and the exception warning so logs correctly show 'user',
'team', or 'global'; reference set_config_env_vars, target_key,
get_db_config_value and os.environ when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@api/transformerlab/plugin_sdk/plugin_harness.py`:
- Around line 110-123: In set_config_env_vars, the log messages use {'user' if
user_id else 'team'} which is incorrect when both user_id and team_id are None
(global config); compute a source string like source = 'user' if user_id else
'team' if team_id else 'global' and use that source variable in both the success
print (after setting os.environ[target_key]) and the exception warning so logs
correctly show 'user', 'team', or 'global'; reference set_config_env_vars,
target_key, get_db_config_value and os.environ when making the change.
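The suggested source label reduces to one helper (a sketch following the review's naming, not the repo's actual code):

```python
def config_source(user_id=None, team_id=None) -> str:
    """Label where a config value was resolved from, for log messages."""
    return "user" if user_id else ("team" if team_id else "global")


# Both the success print and the exception warning can then share one label:
# print(f"Set {target_key} from {config_source(user_id, team_id)} config")
```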

@dadmobile
Member

Diffusion works for me but training doesn't. I'm getting this error about FA3 in xformers for some reason. Let's make a decision at scrum

ImportError: /home/azureuser/.transformerlab/orgs/3c33c85b-628a-4ca8-93d3-b657cb7973b2/workspace/plugins/diffusion_trainer/venv/lib/python3.11/site-packages/xformers/flash_attn_3/_C.so: undefined symbol: torch_list_push_back

Member

@dadmobile dadmobile left a comment

I'm comfortable with this and will let @deep1401 make the call on what to do next.

Member

@deep1401 deep1401 left a comment

Overall, I don't think this is usable for anyone with a smaller GPU. It crashes for me on a dataset of 5 images with 24 GB of VRAM. I was able to make FLUX run with sharding, but instead of sharding, the easiest thing to do here is look at the VRAM management section along with the other recommendations: https://github.com/modelscope/DiffSynth-Studio/blob/main/docs/en/Model_Details/Z-Image.md

I would recommend we don't make any more changes here. Let's close this and remake everything so that it just clones DiffSynth-Studio and executes their scripts directly. We can make a new PR after we move to local providers?

Tagging @dadmobile here for further opinions

from typing import Optional


def get_db_config_value(key: str, team_id: Optional[str] = None, user_id: Optional[str] = None) -> Optional[str]:
Member

Why did you remove the import from transformerlab.plugin and add the function here directly? Was there an issue?

if "ncclCommShrink" in str(e):
print(
"Detected CUDA/NCCL mismatch while importing torch. "
"Reinstall the plugin venv with a torch build matching this machine's CUDA runtime."
Member

We should never face this issue since we do the base install, right?

return None


def resolve_diffusion_model_reference(model: str) -> str:
Member

I don't think we need/support this right now; diffusers itself requires model_index.json.
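For reference, a diffusers-style local-vs-hub resolver hinges on exactly that file; this hypothetical sketch shows the check being alluded to:

```python
from pathlib import Path


def resolve_model_reference(model: str) -> str:
    """Return a local diffusers directory if it looks valid, else assume a hub repo id."""
    candidate = Path(model)
    if (candidate / "model_index.json").is_file():
        return str(candidate)  # local pipeline: diffusers requires model_index.json
    return model  # treat as a Hugging Face repo id like "org/model"
```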

# cache_key = get_pipeline_key(model, adaptor, is_img2img, is_inpainting)

with _PIPELINES_LOCK:
resolved_model = resolve_diffusion_model_reference(model)
Member

We wouldn't need resolving, right? The plugin_harness provides all the info correctly, and you wouldn't reach this stage if something was unresolved.

@dadmobile
Member

OK, agreed. @ParamThakkar123, I know you did a tonne on this PR, but let's take what we did here and instead focus on making a task with DiffSynth on the new-style tasks.

@dadmobile dadmobile closed this Feb 18, 2026